In [1]:
intro_images_path='./Images/Intro_images/'
edge_images_path='./Images/Edge_images/'
seg_images_path='../'#../Images/Seg_images/'
feature_images_path='../'#../Images/Feature_images/'
output_path='./Images/Outputs/'
print('Image paths ....')
Image paths ....

COMPUTER VISION I¶

Master in Artificial Intelligence, USC, UDC, UVigo

Academic year 2024/25


1 Edge Detection¶

  • Image gradients
  • Noise effect and smoothing
  • Derivative of Gaussian filter
  • Canny edge detector
  • Laplacian
  • Hough transform

http://szeliski.org/Book/

Exercise Image gradients¶

OpenCV provides three types of derivative (high-pass) filters: Sobel, Scharr and Laplacian. Repeat the first exercises on this page: https://docs.opencv.org/4.3.0/d5/d0f/tutorial_py_gradients.html https://docs.opencv.org/3.4.2/d5/d0f/tutorial_py_gradients.html

First derivative filters: Sobel and Scharr¶

The Sobel operators combine Gaussian smoothing and differentiation, so the result is fairly resistant to noise.

dst = cv2.Sobel(src, ddepth, dx, dy[, dst[, ksize[, scale[, delta[, borderType]]]]])

You can specify the direction of the derivative, horizontal or vertical, by passing values {0,1} to the dx and dy arguments respectively. You can also specify the kernel size with the ksize argument. If ksize = -1, a 3x3 Scharr filter is used, which gives better results than a 3x3 Sobel filter. Gradient filters can also be applied with the cv2.filter2D() function. The following are the 3x3 kernels corresponding to the first x- and y- image derivatives: bare derivative without Gaussian smoothing, Sobel and Scharr.

sobel_scharr_filters.png

Second derivative filters: Laplacian¶

The Laplacian is the sum of the second partial derivatives. OpenCV provides the cv2.Laplacian() function, which computes these derivatives with Sobel filters and adds them.

dst = cv2.Laplacian(src, ddepth[, dst[, ksize[, scale[, delta[, borderType]]]]])

If ksize == 1 (the default), the following kernel is used for filtering:
K=[[0, 1, 0]
   [1,-4, 1]
   [0, 1, 0]]

The code below shows all three operators in a single figure. All kernels are 3x3. The depth of the output image is passed so that the result is of type np.uint8.

In [2]:
import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread(edge_images_path+'hercules_tower.jpg',0)
laplacian = cv2.Laplacian(img, cv2.CV_8U,ksize=3)
sobelx = cv2.Sobel(img, cv2.CV_8U,1,0,ksize=3)
sobely = cv2.Sobel(img, cv2.CV_8U,0,1,ksize=3)

plt.figure(figsize=(0.05*img.shape[1],0.05*img.shape[0]))
plt.subplot(2,2,1),plt.imshow(img,cmap = 'gray')
plt.title('Original'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,2),plt.imshow(laplacian,cmap = 'gray')
plt.title('Laplacian'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,3),plt.imshow(sobelx,cmap = 'gray')
plt.title('Sobel X'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,4),plt.imshow(sobely,cmap = 'gray')
plt.title('Sobel Y'), plt.xticks([]), plt.yticks([])

plt.show()

IMPORTANT!

The output datatype can be cv2.CV_8U or np.uint8, but there is a subtle problem with that. A black-to-white transition produces a positive slope (positive value), while a white-to-black transition produces a negative slope (negative value). When you convert the data to np.uint8, all negative values are clipped to zero; in other words, you miss that edge. If you want to detect both edges, the better option is to keep the output in a higher-precision type such as cv2.CV_64F, take its absolute value, and then convert back to cv2.CV_8U. The code below demonstrates this procedure for horizontal and vertical Sobel filters and the difference in the results.

In [3]:
import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread(edge_images_path+'hercules_tower.jpg',0)

# Output dtype = cv2.CV_8U
sobelx8u = cv2.Sobel(img,cv2.CV_8U,1,0,ksize=3)
sobely8u = cv2.Sobel(img,cv2.CV_8U,0,1,ksize=3)

# Output dtype = cv2.CV_64F. Then take its absolute value and convert to cv2.CV_8U to visualize it
sobelx64f = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=3)
sobely64f = cv2.Sobel(img,cv2.CV_64F,0,1,ksize=3)
abs_sobelx64f = np.absolute(sobelx64f)
sobelx64f_to_8u = np.uint8(abs_sobelx64f)
abs_sobely64f = np.absolute(sobely64f)
sobely64f_to_8u = np.uint8(abs_sobely64f)

plt.figure(figsize=(0.05*img.shape[1],0.05*img.shape[0]))

plt.subplot(2,2,1),plt.imshow(sobelx8u,cmap = 'gray')
plt.title('Sobel X CV_8U'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,2),plt.imshow(sobelx64f_to_8u,cmap = 'gray')
plt.title('Sobel X abs(CV_64F)'), plt.xticks([]), plt.yticks([])

plt.subplot(2,2,3),plt.imshow(sobely8u,cmap = 'gray')
plt.title('Sobel Y CV_8U'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,4),plt.imshow(sobely64f_to_8u,cmap = 'gray')
plt.title('Sobel Y abs(CV_64F)'), plt.xticks([]), plt.yticks([])

plt.show()

Exercise Canny edge detector¶

https://docs.opencv.org/4.3.0/da/d22/tutorial_py_canny.html https://docs.opencv.org/3.4.2/da/d22/tutorial_py_canny.html

The Canny algorithm was designed to meet the criteria of good detection, good localization and a single response per edge. It uses multiple stages, as shown below. OpenCV implements it as the function cv2.Canny().

edges = cv2.Canny(image, threshold1, threshold2[, edges[, apertureSize[, L2gradient]]])

  • First argument is the input image.
  • Second and third arguments are the thresholds. The lower one (threshold1) is used for edge linking, while the higher one (threshold2) is used to find the initial segments of strong edges.
  • Argument apertureSize is the size of the Sobel kernel used for computing image gradients. By default it is 3.
  • Last argument is L2gradient, which specifies the norm used to measure gradient magnitude. If it is True, the L2 norm is used, which is more accurate. By default it is False and the L1 norm is used.
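The difference between the two norms is simply how the x and y gradient components are combined, as this illustrative snippet shows (gx and gy are hypothetical gradient values):

```python
import numpy as np

gx, gy = 3.0, 4.0               # hypothetical gradient components
l1 = abs(gx) + abs(gy)          # L2gradient=False (default): L1 norm
l2 = float(np.hypot(gx, gy))    # L2gradient=True: L2 norm

print(l1, l2)  # 7.0 5.0 -> the L1 norm overestimates diagonal gradients
```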

canny_detector.png

You can find a detailed implementation of the Canny detector here: https://towardsdatascience.com/canny-edge-detection-step-by-step-in-python-computer-vision-b49c3a2d8123

In [4]:
import cv2
import numpy as np
from matplotlib import pyplot as plt

image = cv2.imread(edge_images_path+'hercules_tower.jpg',0)

img=cv2.GaussianBlur(image,(3,3),0) # some previous smoothing is usually convenient
edges = cv2.Canny(img,5,15,apertureSize=3,L2gradient=True)

plt.figure(figsize=(0.05*img.shape[1],0.05*img.shape[0]))
plt.subplot(131),plt.imshow(image,cmap = 'gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132),plt.imshow(edges,cmap = 'gray')
plt.title('Edge Image s1'), plt.xticks([]), plt.yticks([])


img=cv2.GaussianBlur(image,(15,15),0) # some previous smoothing is usually convenient
edges = cv2.Canny(img,5,15,apertureSize=3,L2gradient=True)

plt.subplot(133),plt.imshow(edges,cmap = 'gray')
plt.title('Edge Image s2'), plt.xticks([]), plt.yticks([])
plt.show()

Exercise¶

Select the best values for threshold1, threshold2, kernel size (or gaussian sigma), for each of the following images: chuvia.jpg, xardin.jpg, paseo.jpg.

  • kernel size and sigma are dependent values. Commonly, the Gaussian distribution has 99.7% of its probability mass within $3\sigma$, so the kernel size should be $k\_size = 6\sqrt{2}\,\sigma$; anyway $k\_size = 6\sigma$, or even smaller, is usually enough.

  • The OpenCV implementation estimates the value of sigma (if 0 is passed) from the kernel size, but not the other way around. You can find the relationship between these two parameters (according to OpenCV) here:

    https://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#getgaussiankernel



EXERCISE 1¶

Compare the performance of the following two template matching alternatives: (i) using the image and template gray levels as features for matching, and (ii) using their edge maps as features. Use faces.jpg and two different templates, right_eye.jpg and right_eye_i.jpg. You will have to compute the edge maps to perform the matching in case (ii).



In [5]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import os

# Note: this example selects only the best match.

# --- 1. PATH CONFIGURATION ---

# Construct the full path to the files
img_path = os.path.join(edge_images_path, 'faces.jpg')
tpl1_path = os.path.join(edge_images_path, 'right_eye.jpg')
tpl2_path = os.path.join(edge_images_path, 'right_eye_i.jpg')

# --- 2. IMAGE LOADING ---
# Load in grayscale (flag 0)
img = cv2.imread(img_path, 0)
tpl1 = cv2.imread(tpl1_path, 0)
tpl2 = cv2.imread(tpl2_path, 0)

# Safety check in case any path is incorrect
if img is None or tpl1 is None or tpl2 is None:
    print(f"Error loading images. Check that they exist in: {edge_images_path}")
else:
    # --- 3. VISUALIZATION HELPER FUNCTION ---
    def draw_match(original_img, template, title, ax):
        # a) Template Matching
        res = cv2.matchTemplate(original_img, template, cv2.TM_CCOEFF_NORMED)
        
        # b) Find the position of the best match
        _, _, _, max_loc = cv2.minMaxLoc(res)
        
        # c) Define the rectangle coordinates
        h, w = template.shape
        top_left = max_loc
        bottom_right = (top_left[0] + w, top_left[1] + h)
        
        # d) Convert the grayscale image to RGB so the green rectangle is visible
        # (works both for gray-level and edge-map inputs)
        img_display = cv2.cvtColor(original_img, cv2.COLOR_GRAY2RGB)
        cv2.rectangle(img_display, top_left, bottom_right, (0, 255, 0), 2)
        
        ax.imshow(img_display)
        ax.set_title(title)
        ax.axis('off')

    # --- 4. EXERCISE EXECUTION ---
    fig, axs = plt.subplots(2, 2, figsize=(12, 10))

    # CASE (i): GRAY-LEVELS
    draw_match(img, tpl1, "(i) Gray - Normal Template", axs[0, 0])
    draw_match(img, tpl2, "(i) Gray - Inverted Template", axs[0, 1])

    # CASE (ii): EDGE MAPS
    # Calculate edge maps using Canny. 
    # Thresholds 100 and 200 are standard, but can be adjusted.
    img_edges = cv2.Canny(img, 100, 200)
    tpl1_edges = cv2.Canny(tpl1, 100, 200)
    tpl2_edges = cv2.Canny(tpl2, 100, 200)

    draw_match(img_edges, tpl1_edges, "(ii) Edges - Normal Template", axs[1, 0])
    draw_match(img_edges, tpl2_edges, "(ii) Edges - Inverted Template", axs[1, 1])

    plt.tight_layout()
    plt.show()
In [6]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import os

# Note: Here, all matches above the thresholds are selected.
# Distinct thresholds are defined for Gray and Canny, selected after several experiments.

# --- 1. NMS FUNCTION (NON-MAXIMUM SUPPRESSION) ---
# This function removes overlapping rectangles to keep only one detection per object
def nms_fast(boxes, overlapThresh):
    # If there are no rectangles, return an empty list
    if len(boxes) == 0:
        return []

    # Convert to float for precise division
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    # Initialize the list of picked indexes
    pick = []

    # Rectangle coordinates
    x1 = boxes[:,0]
    y1 = boxes[:,1]
    x2 = boxes[:,2]
    y2 = boxes[:,3]

    # Calculate the area of each rectangle
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    
    # Sort by the y2 coordinate
    idxs = np.argsort(y2)

    while len(idxs) > 0:
        # Take the last rectangle in the list and add it to "pick"
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # Find the intersection coordinates with the rest
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # Width and height of the intersection
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)

        # Calculate the overlap ratio
        overlap = (w * h) / area[idxs[:last]]

        # Delete indexes from the list that overlap too much
        idxs = np.delete(idxs, np.concatenate(([last], np.where(overlap > overlapThresh)[0])))

    # Return the picked rectangles (converted back to integers)
    return boxes[pick].astype("int")

# --- 2. CONFIGURATION AND LOADING ---

img_path = os.path.join(edge_images_path, 'faces.jpg')
tpl1_path = os.path.join(edge_images_path, 'right_eye.jpg')
tpl2_path = os.path.join(edge_images_path, 'right_eye_i.jpg')

img = cv2.imread(img_path, 0)
tpl1 = cv2.imread(tpl1_path, 0)
tpl2 = cv2.imread(tpl2_path, 0)

if img is None or tpl1 is None or tpl2 is None:
    print("Error: Check the image paths.")
else:
    # --- 3. DETECTION AND DRAWING FUNCTION ---
    def detect_and_draw(original_img, template, title, ax, use_canny=False, threshold=0.6):
        
        # Image preparation (Canny or normal Gray)
        if use_canny:
            proc_img = cv2.Canny(original_img, 100, 200)
            proc_tpl = cv2.Canny(template, 100, 200)
        else:
            proc_img = original_img
            proc_tpl = template
            
        # Matching
        res = cv2.matchTemplate(proc_img, proc_tpl, cv2.TM_CCOEFF_NORMED)
        
        # Filter by threshold to detect multiple occurrences
        loc = np.where(res >= threshold)
        
        # Create list of rectangles [x1, y1, x2, y2]
        w, h = template.shape[::-1] # width, height
        rects = []
        for pt in zip(*loc[::-1]):
            rects.append([pt[0], pt[1], pt[0] + w, pt[1] + h])
        
        rects = np.array(rects)
        
        # Apply NMS (Non-Maximum Suppression)
        pick = nms_fast(rects, 0.3)
        
        
        # If using Canny, show the edge image (proc_img) converted to color
        # Otherwise, show the original gray image converted to color
        
        if use_canny:
            img_display = cv2.cvtColor(proc_img, cv2.COLOR_GRAY2BGR)
        else:
            img_display = cv2.cvtColor(original_img, cv2.COLOR_GRAY2BGR)
        
        for (startX, startY, endX, endY) in pick:
            cv2.rectangle(img_display, (startX, startY), (endX, endY), (0, 255, 0), 2)
        
        ax.imshow(img_display)
        ax.set_title(f"{title}\nDetections: {len(pick)}")
        ax.axis('off')

    # --- 4. EXECUTION (COMPARISON) ---
    fig, axs = plt.subplots(2, 2, figsize=(12, 10))

    # Adjust this threshold if there is too much noise or eyes are missing.
    THRESH_GRAY = 0.55
    THRESH_CANNY = 0.095

    # Row 1: Gray Levels (Background: Original Photo)
    detect_and_draw(img, tpl1, "(i) Gray - Normal Template", axs[0, 0], use_canny=False, threshold=THRESH_GRAY)
    detect_and_draw(img, tpl2, "(i) Gray - Inverted Template", axs[0, 1], use_canny=False, threshold=THRESH_GRAY)

    # Row 2: Edge Maps (Background: Black Edges)
    detect_and_draw(img, tpl1, "(ii) Edges - Normal Template", axs[1, 0], use_canny=True, threshold=THRESH_CANNY)
    detect_and_draw(img, tpl2, "(ii) Edges - Inverted Template", axs[1, 1], use_canny=True, threshold=THRESH_CANNY)

    plt.tight_layout()
    plt.show()

Image blending using Gaussian and Laplacian Pyramids¶

Gaussian Pyramids¶

This type of pyramid is constructed by repeatedly applying a Gaussian blur to an image and downsampling it by a factor of two. The resulting images are smoother and have lower resolution than the original because Gaussians are low-pass filters.

Laplacian Pyramid¶

A Laplacian pyramid is very similar to a Gaussian pyramid, but it stores the difference between the blurred versions of adjacent levels (a Difference of Gaussians approximates the Laplacian of Gaussian operator). The resulting images are band-pass filtered versions of the original, which highlight details and edges at different resolutions. Only the smallest level is not a difference image.


Image Blending¶

  • Choose img1 and img2 and crop/resize to the same size
  • Choose a region mask of the same size
  • Create Gaussian pyramid for img1 and img2
  • Create Laplacian pyramids from Gaussian pyramids
  • Create Gaussian pyramid for the region mask
  • Blend the two Laplacian pyramids using the mask’s Gaussian pyramid to weight the two images at each level of the pyramid
  • Collapse the resulting Laplacian pyramid to reveal the blended image.
In [7]:
from matplotlib import pyplot as plt
import cv2
import numpy as np
import os

# Note: The original code has been modified to solve the normalization problem.

# --- 0. CONFIGURATION ---
# Ensure this path is correct
edge_images_path = './Images/Edge_images/'

def imshow2(im_title, im):
    ''' Helper function to display images '''
    plt.figure(figsize=(6,6))
    plt.title(im_title)
    plt.axis("off")
    
    # If the image is float and has values > 1, Matplotlib will display it as white.
    # Here we ensure it looks correct by clipping if necessary, 
    # but ideally, the image should be passed already normalized (0-1).
    if im.dtype == np.float32 or im.dtype == np.float64:
        # Optional: Clip to avoid warnings, although it is better to normalize outside
        im = np.clip(im, 0, 1) 

    if len(im.shape) == 2:
        plt.imshow(im, cmap="gray")
    else:
        # Matplotlib uses RGB, OpenCV uses BGR
        im_display = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
        plt.imshow(im_display)
    plt.show()

# --- 1. IMAGE LOADING ---
# Load and convert to float32. We do NOT divide by 255 yet to perform mathematical operations 
# exactly as in the original example, but we will divide by 255 ONLY when displaying.
im_sides = cv2.imread(os.path.join(edge_images_path, 'paseo.jpg')).astype(np.float32)
im_center = cv2.imread(os.path.join(edge_images_path, 'road2.jpg')).astype(np.float32)

# Resize to a power of 2 (512x512) facilitates pyramids
im_sides = cv2.resize(im_sides, (512, 512))
im_center = cv2.resize(im_center, (512, 512))

# HERE WE DIVIDE BY 255 FOR DISPLAY
imshow2("Sides Image (Original)", im_sides / 255.0)
imshow2("Center Image (Original)", im_center / 255.0)

# --- 2. MASK CREATION ---
cols = 512
im_mask = np.ones((512, 512, 3), np.float32)
# Define the central zone as 0 (so the 'center' image goes there)
im_mask[:, int(cols/4):int(3*cols/4)] = 0

# Visualization of a simple "cut and paste"
naive_blend = im_mask * im_sides + (1 - im_mask) * im_center
imshow2("Blending without Pyramids (Hard cut)", naive_blend / 255.0)


# --- 3. GAUSSIAN PYRAMIDS ---
gauss_levels = 6
gauss_reduce_sides = im_sides.copy()
gauss_pyr_sides = [gauss_reduce_sides]

gauss_reduce_center = im_center.copy()
gauss_pyr_center = [gauss_reduce_center]

gauss_reduce_mask = im_mask.copy()
gauss_pyr_mask = [gauss_reduce_mask]

for i in range(gauss_levels):
    gauss_reduce_sides = cv2.pyrDown(gauss_reduce_sides)
    gauss_pyr_sides.append(gauss_reduce_sides)
    
    gauss_reduce_center = cv2.pyrDown(gauss_reduce_center)
    gauss_pyr_center.append(gauss_reduce_center)
    
    gauss_reduce_mask = cv2.pyrDown(gauss_reduce_mask)
    gauss_pyr_mask.append(gauss_reduce_mask)


# --- 4. LAPLACIAN PYRAMIDS ---
lap_pyr_sides = []
lap_pyr_center = []

for i in range(gauss_levels):
    source = gauss_pyr_sides[i]
    expanded = cv2.pyrUp(gauss_pyr_sides[i+1])
    
    # Safety adjustment in case sizes do not match after pyrUp
    h, w = source.shape[:2]
    if expanded.shape[:2] != (h, w):
        expanded = cv2.resize(expanded, (w, h))
        
    laplacian = cv2.subtract(source, expanded)
    lap_pyr_sides.append(laplacian)
    
    # Same for the center image
    source_c = gauss_pyr_center[i]
    expanded_c = cv2.pyrUp(gauss_pyr_center[i+1])
    if expanded_c.shape[:2] != (h, w):
        expanded_c = cv2.resize(expanded_c, (w, h))
        
    laplacian_c = cv2.subtract(source_c, expanded_c)
    lap_pyr_center.append(laplacian_c)


# --- 5. LEVEL VISUALIZATION ---
# Float images must be scaled into [0, 1] before display, otherwise they render as white.

print("Showing Gaussian Pyramid (Levels)...")
for index, image in enumerate(gauss_pyr_sides):
    # Divide by 255.0 so Matplotlib displays the float image correctly
    imshow2(f"Gaussian Level {index}", image / 255.0)

print("Showing Laplacian Pyramid (Details)...")
for index, image in enumerate(lap_pyr_sides):
    # The Laplacian has negative and positive values.
    # Take the absolute value and scale it up (x4) so dark details are visible
    vis_lap = np.abs(image) * 4 / 255.0
    imshow2(f"Laplacian Level {index}", vis_lap)


# --- 6. PYRAMID BLENDING ---
com_lap = []
# Use zip to iterate through all 3 lists at once
for sides, center, mask in zip(lap_pyr_sides, lap_pyr_center, gauss_pyr_mask):
    # Blending formula: L_final = L_sides * mask + L_center * (1 - mask)
    mixed = sides * mask + center * (1 - mask)
    com_lap.append(mixed)


# --- 7. RECONSTRUCTION ---
# Start at the top of the pyramid (the smallest Gaussian image)
# Note: Use the last Gaussian level as the base
last_level_idx = gauss_levels
base_image = gauss_pyr_sides[last_level_idx] * gauss_pyr_mask[last_level_idx] + \
             gauss_pyr_center[last_level_idx] * (1 - gauss_pyr_mask[last_level_idx])

im_blended = base_image

# Iterate backwards to add details
for i in range(len(com_lap)-1, -1, -1):
    h, w = com_lap[i].shape[:2]
    im_blended = cv2.pyrUp(im_blended)
    
    # Size safety adjustment
    if im_blended.shape[:2] != (h, w):
        im_blended = cv2.resize(im_blended, (w, h))
        
    im_blended = cv2.add(im_blended, com_lap[i])

# Final clip to ensure valid range
im_blended = np.clip(im_blended, 0, 255)

# --- 8. FINAL RESULT ---
imshow2("Final Result: Pyramid Blending", im_blended / 255.0)
Showing Gaussian Pyramid (Levels)...
Showing Laplacian Pyramid (Details)...
In [8]:
# Combine the images
com_lap = []
for sides, center, mask in zip(lap_pyr_sides, lap_pyr_center, gauss_pyr_mask):
    print(sides.shape)
    sides=mask*sides
    center=(1-mask)*center
    
    com_lap.append(sides+center)
    
# Display the combined images
# Uncomment to see the Laplacian Pyramid

# for index, image in enumerate(com_lap):
#    imshow2("Level {}".format(index), image) 

print("Lowest Laplacian level {}".format(lap_pyr_sides[-1].shape))
print("Lowest Gaussian level {}".format(gauss_pyr_mask[-1].shape))

# Now reconstruct, starting from the smallest blended Gaussian level
im_blended=gauss_pyr_mask[-1]*gauss_pyr_sides[-1]+ (1-gauss_pyr_mask[-1])*gauss_pyr_center[-1]

for i in range(len(com_lap)-1, -1, -1):
    h, w = com_lap[i].shape[:2]
    im_blended = cv2.pyrUp(im_blended, dstsize=(w, h))
    im_blended = cv2.add(im_blended, com_lap[i])

# Display the Images

imshow2("Without Blending ",(im_mask*im_sides+ (1-im_mask)*im_center)/255)
imshow2("With Blending", im_blended/255)
(512, 512, 3)
(256, 256, 3)
(128, 128, 3)
(64, 64, 3)
(32, 32, 3)
(16, 16, 3)
Lowest Laplacian level (16, 16, 3)
Lowest Gaussian level (8, 8, 3)

Image Fusion¶

Differently exposed images look very different in terms of brightness as well as the details they preserve. An underexposed image is dark and loses detail in the darker regions, whereas an overexposed image is too bright and loses detail in the brighter regions. To capture all the details of the scene, the images need to be combined.

Using multiresolution blending, by constructing Gaussian and Laplacian image pyramids, an improved image can be built from images of the same scene (and camera pose) according to different quality measures, for instance saturation. Saturation measures the degree of exposure: a longer-exposed image contains saturated colors, which will eventually be clipped. Saturation can be measured at pixel level as the standard deviation of R, G and B at each pixel.

Image fusion is then obtained from the Laplacian pyramids of the two images, which are combined according to the weights provided by the Gaussian pyramid of the saturation map.



EXERCISE 2¶

Write the code for the fusion of the two images landscape_under.png and landscape_over.png, according to the measure of saturation at pixel level in both images.



In [9]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import os

# --- 1. PATHS ---

path_under = os.path.join(intro_images_path, 'landscape_under.png')
path_over = os.path.join(intro_images_path, 'landscape_over.png')

# --- 2. LOADING AND PREPARATION ---
# Read in color; check loading before converting to float32
img1 = cv2.imread(path_under)
img2 = cv2.imread(path_over)

if img1 is None or img2 is None:
    print("Error: Images not found.")
else:
    # Convert to float32 in [0, 1] (necessary for precise mathematical operations)
    img1 = img1.astype('float32') / 255.0
    img2 = img2.astype('float32') / 255.0

    # --- 3. WEIGHT MAP CALCULATION (SATURATION) ---
    # The statement asks for the "standard deviation of R, G, and B at each pixel"
    # axis=2 computes the deviation across the color channels
    sat1 = np.std(img1, axis=2)
    sat2 = np.std(img2, axis=2)

    # Normalize weights so they sum to 1 at each pixel
    # Add an epsilon (1e-12) to avoid division by zero
    denom = sat1 + sat2 + 1e-12
    w1 = sat1 / denom
    w2 = sat2 / denom

    # --- 4. PYRAMID FUNCTIONS ---
    def gaussian_pyramid(img, levels):
        gp = [img]
        for i in range(levels):
            img = cv2.pyrDown(img)
            gp.append(img)
        return gp

    def laplacian_pyramid(img, levels):
        gp = gaussian_pyramid(img, levels)
        lp = [gp[levels]] # The last Gaussian level is the tip of the Laplacian pyramid
        for i in range(levels, 0, -1):
            gaussian_expanded = cv2.pyrUp(gp[i])
            
            # SIZE ADJUSTMENT: Sometimes pyrUp does not exactly match the previous level
            # if the original image is not a power of 2. Adjust size:
            target_h, target_w = gp[i-1].shape[:2]
            if gaussian_expanded.shape[:2] != (target_h, target_w):
                gaussian_expanded = cv2.resize(gaussian_expanded, (target_w, target_h))
            
            laplacian = cv2.subtract(gp[i-1], gaussian_expanded)
            lp.append(laplacian)
        return lp

    # --- 5. FUSION PROCESS ---
    levels = 6 # Number of pyramid levels (depth)

    # A) Build Laplacian pyramids for the original IMAGES
    lp1 = laplacian_pyramid(img1, levels)
    lp2 = laplacian_pyramid(img2, levels)

    # B) Build Gaussian pyramids for the WEIGHT MASKS
    # (Masks must be smooth, so we use Gaussian)
    gp_w1 = gaussian_pyramid(w1, levels)
    gp_w2 = gaussian_pyramid(w2, levels)

    # C) Blend level by level
    fused_pyramid = []
    for l1, l2, gw1, gw2 in zip(lp1, lp2, gp_w1[::-1], gp_w2[::-1]):
        # Convert weights to 3 channels to multiply with the RGB image
        rows, cols = l1.shape[:2]
        gw1_3c = np.dstack([gw1]*3)
        gw2_3c = np.dstack([gw2]*3)
        
        # Ensure sizes (for safety against pyramid rounding)
        gw1_3c = cv2.resize(gw1_3c, (cols, rows))
        gw2_3c = cv2.resize(gw2_3c, (cols, rows))

        # Fusion formula: L_final = L1*W1 + L2*W2
        fused_layer = l1 * gw1_3c + l2 * gw2_3c
        fused_pyramid.append(fused_layer)

    # --- 6. IMAGE RECONSTRUCTION ---
    ls_ = fused_pyramid[0] # Start at the tip (smallest image)
    for i in range(1, len(fused_pyramid)):
        ls_ = cv2.pyrUp(ls_)
        
        # Size adjustment again to match the next level
        target_h, target_w = fused_pyramid[i].shape[:2]
        if ls_.shape[:2] != (target_h, target_w):
            ls_ = cv2.resize(ls_, (target_w, target_h))
            
        ls_ = cv2.add(ls_, fused_pyramid[i])

    # Clipping to ensure we are between 0 and 1
    result = np.clip(ls_, 0, 1)

    # --- 7. VISUALIZATION ---
    plt.figure(figsize=(15, 5))
    
    plt.subplot(1, 3, 1)
    plt.imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
    plt.title("Underexposed (Under)")
    plt.axis('off')
    
    plt.subplot(1, 3, 2)
    plt.imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
    plt.title("Overexposed (Over)")
    plt.axis('off')
    
    plt.subplot(1, 3, 3)
    plt.imshow(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
    plt.title("Fusion Result (Balanced Exposure)")
    plt.axis('off')
    
    plt.tight_layout()
    plt.show()


Exercise Hough Transform¶

https://docs.opencv.org/3.4.2/d6/d10/tutorial_py_houghlines.html

A line in image space can be expressed with two parameters:

  • In the Cartesian coordinate system: (m,b).
  • In the Polar coordinate system: (r,θ)

For the Hough transform, we will express lines in the polar system. Hence, a line equation can be written as r = x·cosθ + y·sinθ. In general, for each point (x0,y0) we can define the family of lines that goes through that point as r = x0·cosθ + y0·sinθ, meaning that each pair (r,θ) represents one line that passes through (x0,y0).
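For instance (a quick illustrative computation, not part of the tutorial), the points (10,10) and (20,20) both lie on the line y = x, and with θ = 3π/4 they produce the same r, i.e. they vote for the same accumulator cell:

```python
import numpy as np

def hough_r(x, y, theta):
    # r = x*cos(theta) + y*sin(theta)
    return x * np.cos(theta) + y * np.sin(theta)

theta = 3 * np.pi / 4  # normal direction of the line y = x
r1 = hough_r(10, 10, theta)
r2 = hough_r(20, 20, theta)

print(abs(r1 - r2) < 1e-9)  # True -> both points vote for the same (r, theta)
```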

Hough Transform is encapsulated in the OpenCV function cv2.HoughLines().

lines = cv2.HoughLines(edges, rho, theta, threshold[, lines[, srn[, stn[, min_theta[, max_theta]]]]])

  • It simply returns an array of (r,θ) values; r is measured in pixels and θ in radians.
  • The first parameter, edges, should be a binary image, so apply a threshold or Canny edge detection before applying the Hough transform.
  • Second and third parameters are the rho and theta accuracies, respectively.
  • Fourth argument is the threshold: the minimum number of votes a candidate needs to be considered a line. Remember, the number of votes depends on the number of points detected on the line.
In [10]:
import cv2
import numpy as np
import math
from matplotlib import pyplot as plt

img = cv2.imread(edge_images_path+'building.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
imgP = img.copy()

gray = cv2.GaussianBlur(gray,(7,7),0)
edges = cv2.Canny(gray,1,50,apertureSize=3,L2gradient=True)
lines = cv2.HoughLines(edges,1,np.pi/180,120)

#Draw all the lines
if lines is not None:
    for i in range(0,len(lines)):
        rho = lines[i][0][0]
        theta = lines[i][0][1]
        a = np.cos(theta)
        b = np.sin(theta)
        x0 = a * rho
        y0 = b * rho
        pt1 = (int(x0 + 1000*(-b)), int(y0 + 1000*(a)))
        pt2 = (int(x0 - 1000*(-b)), int(y0 - 1000*(a)))
        cv2.line(imgP, pt1, pt2, (0,0,255), 2)

plt.imshow(edges,cmap='gray'), plt.xticks([]), plt.yticks([])
plt.show()
plt.imshow(imgP,cmap='gray'), plt.xticks([]), plt.yticks([])
plt.show()

A more efficient implementation of the Hough line transform is the function cv2.HoughLinesP(). It returns the endpoints of the detected lines (x0,y0,x1,y1). This function has two extra parameters:

  • minLineLength: minimum line length. Line segments shorter than this are rejected.
  • maxLineGap: maximum allowed gap between points on the same line to link them.
In [11]:
linesP = cv2.HoughLinesP(edges, 1, np.pi/180, 50, minLineLength=50, maxLineGap=10)

imgP = img.copy()


if linesP is not None:
    for i in range(0, len(linesP)):
        l = linesP[i][0]
        cv2.line(imgP, (l[0], l[1]), (l[2], l[3]), (255, 0, 0), 3, cv2.LINE_AA)

plt.imshow(cv2.cvtColor(imgP, cv2.COLOR_BGR2RGB))
plt.show()
No description has been provided for this image


EXERCISE 3¶

Consider the image naval.jpg. Try to detect lines with the best trade-off between the number of True Positive lines and the number of False Negatives. Consider different levels of smoothing when applying the Canny detector.



In [12]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import os

# --- 1. IMAGE LOADING ---
edge_images_path = './Images/Edge_images/'
img_path = os.path.join(edge_images_path, 'naval.jpg')

img = cv2.imread(img_path)

if img is None:
    print(f"Error: Image not found at {img_path}")
else:
    # Convert to gray once
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # --- 2. FUNCTION TO TEST DIFFERENT BLUR LEVELS ---
    def process_lines(title, kernel_size_blur):
        # A) Smoothing (Gaussian Blur)
        # If the kernel is (1,1) or 0, there is no effective smoothing.
        if kernel_size_blur > 1:
            blurred = cv2.GaussianBlur(gray, (kernel_size_blur, kernel_size_blur), 0)
        else:
            blurred = gray

        # B) Canny Edge Detector
        # Thresholds 50 and 150 are standard, but work reasonably well for naval.jpg
        edges = cv2.Canny(blurred, 50, 150, apertureSize=3)

        # C) Probabilistic Hough Transform
        # Adjust parameters to look for long lines (ship structure/horizon)
        # threshold=50: Minimum votes.
        # minLineLength=50: Minimum length to be a line.
        # maxLineGap=10: Maximum allowed gap to join segments.
        lines = cv2.HoughLinesP(edges, 1, np.pi/180, threshold=50, minLineLength=50, maxLineGap=10)

        # D) Draw
        img_display = img.copy()
        num_lines = 0
        if lines is not None:
            num_lines = len(lines)
            for line in lines:
                x1, y1, x2, y2 = line[0]
                cv2.line(img_display, (x1, y1), (x2, y2), (0, 255, 0), 2)
        
        return edges, img_display, num_lines

    # --- 3. COMPARATIVE EXPERIMENT ---
    
    # CASE 1: Low smoothing (Kernel 3x3) -> High detail, possible noise
    edges1, res1, n1 = process_lines("Low Smoothing (3x3)", 3)
    
    # CASE 2: Medium Smoothing (Kernel 5x5) -> Balanced
    edges2, res2, n2 = process_lines("Medium Smoothing (5x5)", 5)
    
    # CASE 3: High Smoothing (Kernel 9x9) -> Very clean, loses fine details
    edges3, res3, n3 = process_lines("High Smoothing (9x9)", 9)

    # --- 4. VISUALIZATION ---
    plt.figure(figsize=(12, 12))

    # Row 1: Low Smoothing
    plt.subplot(3, 2, 1)
    plt.imshow(edges1, cmap='gray')
    plt.title(f"Edges (Blur 3x3)")
    plt.axis('off')
    
    plt.subplot(3, 2, 2)
    plt.imshow(cv2.cvtColor(res1, cv2.COLOR_BGR2RGB))
    plt.title(f"Lines: {n1} (Too much noise in water?)")
    plt.axis('off')

    # Row 2: Medium Smoothing (Best usual compromise)
    plt.subplot(3, 2, 3)
    plt.imshow(edges2, cmap='gray')
    plt.title(f"Edges (Blur 5x5)")
    plt.axis('off')
    
    plt.subplot(3, 2, 4)
    plt.imshow(cv2.cvtColor(res2, cv2.COLOR_BGR2RGB))
    plt.title(f"Lines: {n2} (Balanced)")
    plt.axis('off')

    # Row 3: High Smoothing
    plt.subplot(3, 2, 5)
    plt.imshow(edges3, cmap='gray')
    plt.title(f"Edges (Blur 9x9)")
    plt.axis('off')
    
    plt.subplot(3, 2, 6)
    plt.imshow(cv2.cvtColor(res3, cv2.COLOR_BGR2RGB))
    plt.title(f"Lines: {n3} (Masts/details lost)")
    plt.axis('off')

    plt.tight_layout()
    plt.show()
No description has been provided for this image

Exercise: Hough Transform applied to circles¶

https://docs.opencv.org/4.x/da/d53/tutorial_py_houghcircles.html

A circle is represented mathematically by its center (xcenter,ycenter) and its radius r. Since it has 3 parameters, we need a 3D accumulator for the Hough transform. OpenCV provides the function cv2.HoughCircles().

    circles= cv2.HoughCircles(image, method, dp, minDist[, param1[, param2[, minRadius[, maxRadius]]]])

  • image: 8-bit, single-channel, grayscale input image.
  • circles: Output vector of found circles. Each circle is encoded as a 3- or 4-element floating-point vector (x,y,radius) or (x,y,radius,votes).
  • method: Detection method. The available methods are HOUGH_GRADIENT and HOUGH_GRADIENT_ALT.
  • dp: Inverse ratio of the accumulator resolution to the image resolution: if dp=1 the accumulator has the same resolution as the input image; if dp=2 the accumulator has half the width and height.
  • minDist: Minimum distance between the centers of the detected circles.
  • param1: Higher of the two thresholds passed to the Canny edge detector (the lower one is half as large).
  • param2: Accumulator threshold for the circle centers at the detection stage.
  • minRadius: Minimum circle radius.
  • maxRadius: Maximum circle radius.
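To see why three parameters force a 3D accumulator, here is a minimal sketch that votes for circle centers at one fixed radius, i.e. a single 2D slice of the full (x,y,r) accumulator. This is purely didactic; cv2.HoughCircles uses the gradient method rather than this brute-force voting:

```python
import numpy as np

def vote_centers(edge_points, shape, r, n_angles=360):
    """For a fixed radius r, each edge point votes for every candidate
    center at distance r from it: one 2D slice of the (x, y, r) accumulator."""
    acc = np.zeros(shape, dtype=int)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    for x, y in edge_points:
        cx = np.round(x - r * np.cos(angles)).astype(int)
        cy = np.round(y - r * np.sin(angles)).astype(int)
        ok = (0 <= cx) & (cx < shape[1]) & (0 <= cy) & (cy < shape[0])
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    return acc

# Synthetic edge points on a circle of radius 10 centered at (30, 30)
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
pts = [(30 + 10 * np.cos(a), 30 + 10 * np.sin(a)) for a in t]
acc = vote_centers(pts, (64, 64), r=10)
cy, cx = np.unravel_index(np.argmax(acc), acc.shape)
print(cx, cy)   # the true center (30, 30) collects the most votes
```

A full implementation would repeat this voting for every candidate radius, which is why `minRadius`/`maxRadius` matter so much for speed.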
In [13]:
import numpy as np
import cv2 
from matplotlib import pyplot as plt
import os


filename = 'spec2.tif' 
img_path = os.path.join(edge_images_path, filename)

img = cv2.imread(img_path, 0)

if img is None:
    print(f"ERROR: File '{filename}' not found in path: {edge_images_path}")
    print("Check that the filename is correct and it exists in that folder.")
else:
    # Apply blur to reduce noise
    img_blur = cv2.GaussianBlur(img, (9, 9), 2)

    # Convert to color to draw green/red circles on top
    cimg = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

    # Detect circles
    circles = cv2.HoughCircles(img_blur, cv2.HOUGH_GRADIENT, dp=2, minDist=20,
                               param1=40, param2=30, minRadius=10, maxRadius=20)

    if circles is not None:
        circles = np.uint16(np.around(circles))
        for i in circles[0, :]:
            # Draw the outer circle (green)
            cv2.circle(cimg, (i[0], i[1]), i[2], (0, 255, 0), 2)
            # Draw the center of the circle (red)
            cv2.circle(cimg, (i[0], i[1]), 2, (0, 0, 255), 3)

        plt.imshow(cv2.cvtColor(cimg, cv2.COLOR_BGR2RGB))
        plt.title('Detected Circles')
        plt.xticks([]), plt.yticks([])
        plt.show()
    else:
        print("No circles were detected with these parameters.")
No description has been provided for this image


EXERCISE 4¶

Implement a program for the estimation of the number of fishes in fishes.jpg image, according to the number of detected eyes. Choose the most convenient parameters.



In [14]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import os

# Note: Different experiments and parameters have been tested, and the most effective one has been kept.

# --- 1. CONFIGURATION ---
edge_images_path = './Images/Edge_images/'
img_path = os.path.join(edge_images_path, 'fishes.jpg')

img = cv2.imread(img_path)

if img is None:
    print(f"Error: Could not load image at {img_path}")
else:
    output_img = img.copy()
    
    # --- 2. PREPROCESSING ---
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    # Keep smoothing at 5, it is a good compromise
    gray_blurred = cv2.medianBlur(gray, 5)

    # --- 3. CIRCLE DETECTION ---
    # MIDPOINT ADJUSTMENTS:
    # - param2=28: The key value. 30 was too strict, 20 too noisy.
    # - minDist=28: Avoid grouping too close.
    # - minRadius=4: Filter smallest noise without losing distant eyes.
    
    circles = cv2.HoughCircles(gray_blurred, 
                               cv2.HOUGH_GRADIENT, 
                               dp=1, 
                               minDist=28,       
                               param1=50,        
                               param2=28,        # MIDPOINT (Between 20 and 30)
                               minRadius=4,      # Slightly larger than before to avoid tiny specks
                               maxRadius=30)

    # --- 4. DRAWING AND COUNTING ---
    count = 0
    if circles is not None:
        circles = np.uint16(np.around(circles))
        
        for i in circles[0, :]:
            center = (i[0], i[1])
            radius = i[2]
            
            # Draw outer circle (Green)
            cv2.circle(output_img, center, radius, (0, 255, 0), 2)
            # Draw center (Red)
            cv2.circle(output_img, center, 2, (0, 0, 255), 3)
            count += 1
            
        print(f"Estimation: {count} fish (eyes) detected.")
    else:
        print("No fish detected. Try slightly lowering 'param2'.")

    # --- 5. VISUALIZATION ---
    plt.figure(figsize=(12, 10))
    plt.imshow(cv2.cvtColor(output_img, cv2.COLOR_BGR2RGB))
    plt.title(f'Exercise 4: Fish Estimation ({count} detected)')
    plt.axis('off')
    plt.show()
Estimation: 12 fish (eyes) detected.
No description has been provided for this image